92 research outputs found

    A Reactive and Cycle-True IP Emulator for MPSoC Exploration

    Get PDF
    The design of MultiProcessor Systems-on-Chip (MPSoC) emphasizes intellectual-property (IP)-based communication-centric approaches. Therefore, for the optimization of the MPSoC interconnect, the designer must develop traffic models that realistically capture the application behavior as executing on the IP core. In this paper, we introduce a Reactive IP Emulator (RIPE) that enables an effective emulation of the IP-core behavior in multiple environments, including bitand cycle-true simulation. The RIPE is built as a multithreaded abstract instruction-set processor, and it can generate reactive traffic patterns. We compare the RIPE models with cycle-true functional simulation of complex application behavior (tasksynchronization, multitasking, and input/output operations). Our results demonstrate high-accuracy and significant speedups. Furthermore, via a case study, we show the potential use of the RIPE in a design-space-exploration context

    Improving the Fault Tolerance of Nanometric PLA Designs

    Get PDF
    Several alternative building blocks have been proposed to replace planar transistors, among which a prominent spot belongs to nanometric laments such as Silicon NanoWires (SiNWs) and Carbon NanoTubes (CNTs). However, chips leveraging these nanoscale structures are expected to be affected by a large amount of manufacturing faults, way beyond what chip architects have learned to counter. In this paper, we show a design ow, based on software mapping algorithms, to improve the yield of nanometric Programmable Logic Arrays (PLAs). While further improvements to the manufacturing technology will be needed to make these devices fully usable, our ow can signi cantly shrink the gap between current and desired yield levels. Also, our approach does not need post-fabrication functional analysis and mapping, therefore dramatically cutting on veri cation costs. We check PLA yields by means of an accurate analyzer after Monte Carlo fault injection. We show that, compared to a baseline policy of wire replication, we achieve equal or better yields (8% over a set of designs) depending on the underlying defect assumptions

    Characterization and Implementation of Fault-Tolerant Vertical Links for 3-D Networks-on-Chip

    Get PDF
    Through silicon vias (TSVs) provide an efficient way to support vertical communication among different layers of a vertically stacked chip, enabling scalable 3-D networks-on-chip (NoC) architectures. Unfortunately, low TSV yields significantly impact the feasibility of high-bandwidth vertical connectivity. In this paper, we present a semi-automated design flow for 3-D NoCs including a defect-tolerance scheme to increase the global yield of 3-D stacked chips. Starting from an accurate physical and geometrical model of TSVs: 1) we extract a circuit-level model for vertical interconnections; 2) we use it to evaluate the design implications of extending switch architectures with ports in the vertical direction; moreover, 3) we present a defect-tolerance technique for TSV-based multi-bit links through an effective use of redundancy; and finally, 4) we present a design flow allowing for post-layout simulation of NoCs with links in all three physical dimensions. Experimental results show that a 3-D NoC implementation yields around 10% frequency improvement over a 2-D one, thanks to the propagation delay advantage of TSVs and the shorter links. In addition, the adopted fault tolerance scheme demonstrates a significant yield improvement, ranging from 66% to 98%, with a low area cost (20.9% on a vertical link in a NoC switch, which leads a modest 2.1% increase in the total switch area) in 130 nm technology, with minimal impact on very large-scale integrated design and test flows

    Apodization Scheme for Hardware-Efficient Beamformer

    Get PDF
    3D ultrasound is an emerging diagnostic technique that extends standard ultrasound imaging by capturing volumes, instead of planes. This brings completely new diagnostic opportunities, among which the possibility of disjoining image acquisition and analysis, thus enabling remote diagnosis, which would bring obvious medical and economic benefits. Unfortunately, 3D ultrasound is several orders of magnitude more computationally complex than 2D imaging. Therefore, algorithmic improvements to simplify the processing are mandatory in order to conceive cheap, portable, low-power imagers. The kernel of the 3D imaging process, called beamforming, consists essentially of computing delay and apodization profiles. We have previously devised an approximation of the delay calculation stage, which dramatically reduces hardware complexity. Unfortunately, this approximation introduces an intrinsic degree of inaccuracy that can be characterized as added image noise. In this paper, we identify an efficient approximated approach to the calculation of apodization profiles, that additionally minimizes (-76%) the error introduced during delay calculation. Together, these two techniques enable an efficient computation of 3D ultrasound images

    Tackling the Bottleneck of Delay Tables in 3D Ultrasound Imaging

    Get PDF
    3D ultrasound imaging is quickly becoming a refer- ence technique for high-quality, accurate, expressive diagnostic medical imaging. Unfortunately, its computation requirements are huge and, today, demand expensive, power-hungry, bulky processing resources. A key bottleneck is the receive beamforming operation, which requires the application of many permutations of fine-grained delays among the digitized received echoes. To apply these delays in the digital domain, in principle large tables (billions of coefficients) are needed, and the access bandwidth to these tables can reach multiple TB/s, meaning that their storage both on-chip and off-chip is impractical. However, smarter implementations of the delay generation function, including forgoing the tables altogether, are possible. In this paper we explore efficient strategies to compute the delay function that controls the reconstruction of the image, and present a feasibility analysis for an FPGA platform

    Bringing NoCs to 65nm

    Get PDF
    Very deep submicron process technologies are ideal application fields for NoCs, which offer a promising solution to the scalability problem. This article sheds light on the benefits and challenges of Noc-Based interconnect design in nanometer CMOS. The author present experimental results from fully working 65-NM Noc Designs and a detailed scalability analysis

    Networks on Chips: From Research to Products

    Get PDF
    Research on Networks on Chips (NoCs) has spanned over a decade and its results are now visible in some products. Thus the seminal idea of using networking technology to address the chip-level interconnect problem has been shown to be correct. Moreover, as technology scales down in geometry and chips scale up in complexity, NoCs become the essential element to achieve the desired levels of performance and quality of service while curbing power consumption levels. Design and timing closure can only be achieved by a sophisticated set of tools that address NoC synthesis, optimization and validation

    Computing Accurate Performance Bounds for Best Effort Networks-on-Chip

    Get PDF
    Real-time (RT) communication support is a critical requirement for many complex embedded applications which are currently targeted to Network-on-chip (NoC) platforms. In this paper, we present novel methods to efficiently calculate worst- case bandwidth and latency bounds for RT traffic streams on wormhole-switched NoCs with arbitrary topology. The proposed methods apply to best-effort NoC architectures, with no extra hardware dedicated to RT traffic support. By applying our methods to several realistic NoC designs, we show substantial improvements (more than 30% in bandwidth and 50% in latency, on average) in bound tightness with respect to existing approaches

    Comparison of a Timing-Error Tolerant Scheme with a Traditional Re-transmission Mechanism for Networks on Chips

    Get PDF
    On-chip wires are becoming unreliable as the effect of various noise sources increases with technology scaling. This leads to unpredictable timing delay variations on the interconnect wires. There is a significant need to mitigate the effect of parasitics on the interconnects, while keeping performance and area overheads at a minimum. In this work, we present a timing error tolerant design methodology, T-error, that provides dynamic recovery from timing delay variations on the interconnects. We validate the functionality of the T-error methodology using cycle-accurate RTL models of a Network-on-Chip (NoC) design, that are integrated onto a multiprocessor virtual platform. Our comparisons with the state-of-the-art error recovery mechanisms show that the T-error system provides error recovery with higher performance than the existing schemes. We also present the synthesis results for the T-error scheme, which show that the scheme has negligible overhead
    • …
    corecore